Abstract

This paper reports on the Learning Computational Grammars (LCG) project, a postdoc network devoted to studying the application of machine learning techniques to grammars suitable for computational use. We were interested in a more systematic survey to understand the relevance of many factors to the success of learning, especially the availability of annotated data, the kind of dependencies in the data, and the availability of knowledge bases (grammars). We focused on syntax, especially noun phrase (NP) syntax.



1 Introduction 

This paper reports on the still preliminary but already satisfying results of the Learning Computational Grammars (LCG) project, a postdoc network devoted to studying the application of machine learning techniques to grammars suitable for computational use. The member institutes are listed with the authors; ISSCO at the University of Geneva was also a member. We were impressed by early experiments applying learning to natural language, but dissatisfied with the concentration on a few techniques from the very rich area of machine learning. We were interested in a more systematic survey to understand the relevance of many factors to the success of learning, especially the availability of annotated data, the kind of dependencies in the data, and the availability of knowledge bases (grammars). From the beginning we focused on syntax, especially noun phrase (NP) syntax. The industrial partner, Xerox, focused on more immediate applications (Cancedda and Samuelsson, 2000).

The network was focused not only by its scientific goal, the application and evaluation of machine learning techniques for learning natural language syntax, and by the chosen subarea of syntax, NP syntax, but also by the use of shared training and test material, in this case material drawn from the Penn Treebank. Finally, we were curious about the possibility of combining different techniques, including those from statistical and symbolic machine learning. The network members played an important role in the organisation of three open workshops in which several external groups participated, sharing data and test materials.

2 Method 

This section starts with a description of the three tasks that we have worked on in the framework of this project. After this we describe the machine learning algorithms applied to these tasks and conclude with some notes about combining the results of different systems.

2.1 Task descriptions 

In the framework of this project, we have worked 
on the following three tasks: 

1. base phrase (chunk) identification 

2. base noun phrase recognition 

3. finding arbitrary noun phrases 



Text chunks are non-overlapping phrases which contain syntactically related words. For example, the sentence:

[np He ] [vp reckons ] [np the current account deficit ] [vp will narrow ] [pp to ] [np only £ 1.8 billion ] [pp in ] [np September ] .

contains eight chunks: four NP chunks, two VP chunks and two PP chunks. The PP chunks contain only prepositions rather than prepositions plus the noun phrase material, because that material has already been included in NP chunks. The process of finding these phrases is called CHUNKING. The project provided a data set for this task at the CoNLL-2000 workshop (Tjong Kim Sang and Buchholz, 2000)¹. It consists of sections 15-18 of the Wall Street Journal part of the Penn Treebank II (Marcus et al., 1993) as training data (211727 tokens) and section 20 as test data (47377 tokens).
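Since most of the learners described below assign a tag to every word rather than brackets to a sentence, the bracketed example above first has to be converted to per-word chunk tags. The following Python sketch does this under stated assumptions: it uses the IOB2 convention (B-X opens a chunk of type X, I-X continues it, O marks words outside any chunk) and takes the whitespace-separated bracket notation shown above as input; the function name `brackets_to_iob` is hypothetical.

```python
def brackets_to_iob(tokens):
    """Convert bracket-notation tokens into (word, tag) pairs with IOB2 tags:
    B-X opens a chunk of type X, I-X continues it, O is outside any chunk."""
    tagged = []
    chunk_type = None        # type of the chunk currently open, if any
    at_chunk_start = False   # True until the first word of a chunk is seen
    for tok in tokens:
        if tok.startswith("[") and len(tok) > 1:
            chunk_type = tok[1:].upper()      # "[np" -> "NP"
            at_chunk_start = True
        elif tok == "]":
            chunk_type = None
        elif chunk_type is None:
            tagged.append((tok, "O"))
        else:
            prefix = "B" if at_chunk_start else "I"
            tagged.append((tok, f"{prefix}-{chunk_type}"))
            at_chunk_start = False
    return tagged


sentence = ("[np He ] [vp reckons ] [np the current account deficit ] "
            "[vp will narrow ] [pp to ] [np only £ 1.8 billion ] "
            "[pp in ] [np September ] .")

for word, tag in brackets_to_iob(sentence.split()):
    print(f"{word}\t{tag}")
```

Each of the eight chunks yields exactly one B- tag, and the final full stop, which lies outside every chunk, is tagged O.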
A specialised version of the chunking task is NP CHUNKING or baseNP identification, in which the goal is to identify the base noun phrases. The first work on this topic was done back in the eighties (Church, 1988). The data set that has become standard for evaluating machine learning approaches is the one first used by Ramshaw and Marcus (1995). It consists of the same training and test segments of the Penn Treebank as the chunking task (sections 15-18 and section 20, respectively). However, since the data sets have been generated with different software, the NP boundaries in the NP chunking data sets are slightly different from the NP boundaries in the general chunking data.

Noun phrases are not restricted to the base levels of parse trees. For example, in the sentence "In early trading in Hong Kong Monday , gold was quoted at $ 366.50 an ounce .", the noun phrase [np $ 366.50 an ounce ] contains two embedded noun phrases, [np $ 366.50 ] and [np an ounce ]. In the NP BRACKETING task, the goal is to find all noun phrases in a sentence. Data sets for this task were defined for CoNLL-99². The data consist of the same segments of the Penn Treebank as the previous two tasks: sections 15-18 as training material and section 20 as test material. This material was extracted directly from the Treebank, and therefore the NP boundaries at base levels are different from those in the previous two tasks.

1 Detailed information about chunking, the CoNLL-2000 shared task, is also available at http://lcg-www.uia.ac.be/conll2000/chunking/

2 Information about NP bracketing can be found at http://lcg-www.uia.ac.be/conll99/npb

In the evaluation of all three tasks, the accuracy of the learners is measured with three rates. We compare the constituents postulated by the learners with those marked as correct by experts (the gold standard). The first rate is the percentage of detected constituents that are correct (precision). The second is the percentage of correct constituents that are detected (recall). The third is a combination of precision and recall, the Fβ=1 rate, which is equal to (2 × precision × recall)/(precision + recall).
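As an illustration of the three rates, the following sketch computes them over sets of constituents. It assumes, as a simplification, that each constituent is a hypothetical (start, end, type) triple and that a postulated constituent counts as correct only if the gold standard contains an identical triple.

```python
def evaluate(predicted, gold):
    """Return (precision, recall, F_beta=1) in percent for two sets of
    constituents; a postulated constituent is correct only if the gold
    standard contains an identical (start, end, type) triple."""
    correct = len(predicted & gold)
    precision = 100.0 * correct / len(predicted) if predicted else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical constituents: (first word, last word, type), inclusive indices.
gold = {(0, 0, "NP"), (1, 1, "VP"), (2, 5, "NP"), (6, 7, "VP")}
predicted = {(0, 0, "NP"), (1, 1, "VP"), (2, 4, "NP")}

p, r, f = evaluate(predicted, gold)
print(f"precision {p:.2f}%  recall {r:.2f}%  F {f:.2f}")
```

With two of three postulated chunks correct against four gold chunks, precision is 66.67%, recall 50.00% and Fβ=1 = 57.14; the partially overlapping chunk (2, 4, NP) earns no credit.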

2.2 Machine Learning Techniques 

This section introduces the ten learning methods that have been applied by the project members to the three tasks: LSCGs, ALLiS, LSOMMBL, Maximum Entropy, Aleph, MDL-based DCG learners, Finite State Transducers, IB1-IG, IGTree and C5.0.

Local Structural Context Grammars (LSCGs) (Belz, 2001) are situated between conventional probabilistic context-free production rule grammars and DOP-Grammars (e.g., Bod and Scha (1997)). LSCGs outperform the former because they do not share their inherent independence assumptions, and they are more computationally efficient than the latter because they incorporate only subsets of the context included in DOP-Grammars. Local Structural Context (LSC) is (partial) information about the immediate neighbourhood of a phrase in a parse. By conditioning bracketing probabilities on LSC, more fine-grained probability distributions can be achieved and parsing performance can be increased.

Given corpora of parsed text such as the WSJ, LSCGs are used in automatic grammar construction as follows. An LSCG is derived from the corpus by extracting production rules from bracketings and annotating the rules with the type(s) of LSC to be incorporated in the LSCG (e.g. parent category information, depth of embedding, etc.). Rule probabilities are derived from rule frequencies (currently by Maximum Likelihood Estimation). In a separate optimisation step, the resulting LSCGs are optimised in terms of size and parsing performance for a given parsing task by an automatic method (currently a version of beam search) that searches the space of partitions of a grammar's set of nonterminals.
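The extraction and estimation steps can be sketched as follows. This is a minimal illustration, not the LSCG implementation: it assumes trees encoded as nested tuples whose leaves are part-of-speech tags (in line with the unlexicalised LSCG setting), uses parent category as the only type of LSC, and omits the partition-searching optimisation step; the names `extract_rules` and `mle_probabilities` and the toy treebank are hypothetical.

```python
from collections import Counter


def extract_rules(tree, parent="TOP", counts=None):
    """Count production rules whose nonterminals carry their parent category.

    A tree is a tuple (label, child, ...); a child that is a plain string is a
    part-of-speech tag, so no lexical information enters the grammar.
    """
    if counts is None:
        counts = Counter()
    label, *children = tree
    lhs = f"{label}^{parent}"
    rhs = tuple(c if isinstance(c, str) else f"{c[0]}^{label}" for c in children)
    counts[(lhs, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            extract_rules(child, parent=label, counts=counts)
    return counts


def mle_probabilities(counts):
    """Maximum Likelihood Estimation: P(lhs -> rhs) = count(rule) / count(lhs)."""
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# A toy two-sentence "treebank" (hypothetical).
treebank = [
    ("S", ("NP", "PRP"), ("VP", "VBZ", ("NP", "DT", "NN"))),
    ("S", ("NP", "DT", "NN"), ("VP", "VBD")),
]
counts = Counter()
for tree in treebank:
    extract_rules(tree, counts=counts)
probs = mle_probabilities(counts)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

The parent annotation gives NPs under S and NPs under VP separate rule distributions, which a plain probabilistic context-free grammar would pool.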

The LSCG research efforts differ from the other approaches reported in this paper in two respects. Firstly, no lexical information is used at any point, as the aim is to investigate the upper limit of parsing performance without lexicalisation. Secondly, grammars are optimised for both parsing performance and size, the aim being to improve performance but not at the price of arbitrary increases in grammar complexity (and hence in the cost of parsing). The automatic optimisation of corpus-derived LSCGs is the subject of ongoing research, and the results reported here for this method are therefore preliminary.

Theory Refinement (ALLiS). ALLiS (Dejean, 2000b; Dejean, 2000c) is an inductive rule-based system using a traditional general-to-specific approach (Mitchell, 1997). After generating a default classification rule (equivalent to the n-gram model), ALLiS tries to refine it, since the accuracy of such default rules is usually not high enough. Refinement is done by adding more premises (contextual elements). ALLiS uses data encoded in XML, and also learns rules in XML. From the perspective of the XML formalism, the initial rule can be viewed as a tree with only one leaf, and refinement is done by adding adjacent leaves until the accuracy of the rule is high enough (a tuning threshold is used). These additional leaves correspond to more precise contextual elements. Using the hierarchical structure of an XML document, refinement begins with the highest available hierarchical level and goes down in the hierarchy (for example, starting at the chunk level and then moving to the word level). Adding new low-level elements makes the rules more specific, increasing their accuracy but decreasing their coverage. After learning is completed, the set of rules is transformed into the formalism used by a given parser.
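The general-to-specific refinement loop can be illustrated with a simplified sketch. It is not ALLiS itself: instead of XML trees it represents a rule as a dictionary of feature/value premises, follows a single greedy refinement path, and models the hierarchy as an ordered list of feature groups (chunk-level context before word-level context). The feature names, the threshold of 0.9, the toy examples and the helper names are all assumptions made for illustration.

```python
def matches(conditions, features):
    """True if every premise (feature == value) holds for this example."""
    return all(features.get(f) == v for f, v in conditions.items())


def rule_stats(conditions, target, examples):
    """(accuracy, coverage) of the rule 'if conditions then target'."""
    covered = [label for features, label in examples if matches(conditions, features)]
    if not covered:
        return 0.0, 0
    return covered.count(target) / len(covered), len(covered)


def refine(conditions, target, examples, levels, threshold=0.9):
    """Greedily add premises until accuracy reaches the threshold.

    `levels` lists groups of feature names from the highest hierarchical
    level down to the lowest; a lower level is only tried when no premise
    from a higher level improves the rule.
    """
    conditions = dict(conditions)
    accuracy, _ = rule_stats(conditions, target, examples)
    while accuracy < threshold:
        improved = False
        for level in levels:
            best = None
            for feat in level:
                if feat in conditions:
                    continue
                # Candidate values: those seen on covered examples of the target class.
                values = {features[feat] for features, label in examples
                          if label == target and feat in features
                          and matches(conditions, features)}
                for value in sorted(values):
                    candidate = {**conditions, feat: value}
                    acc, _ = rule_stats(candidate, target, examples)
                    if best is None or acc > best[0]:
                        best = (acc, candidate)
            if best is not None and best[0] > accuracy:
                accuracy, conditions = best
                improved = True
                break              # restart from the highest level
        if not improved:
            break                  # no premise helps any more
    return conditions


# Hypothetical examples: (features of a word, its chunk tag).
examples = [
    ({"pos": "NN", "prev_chunk": "I-NP", "word": "deficit"}, "I-NP"),
    ({"pos": "NN", "prev_chunk": "I-NP", "word": "account"}, "I-NP"),
    ({"pos": "NN", "prev_chunk": "I-NP", "word": "market"}, "I-NP"),
    ({"pos": "NN", "prev_chunk": "O", "word": "gold"}, "B-NP"),
    ({"pos": "NN", "prev_chunk": "O", "word": "profit"}, "B-NP"),
]
default_rule = {"pos": "NN"}                 # accuracy 3/5 for I-NP
levels = [["prev_chunk"], ["word"]]         # chunk level before word level
print(refine(default_rule, "I-NP", examples, levels))
```

Here one chunk-level premise already lifts the rule's accuracy from 0.6 to 1.0, so the word level is never consulted; the refined rule covers only three of the five examples, illustrating the accuracy/coverage trade-off.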

Labelled SOM and Memory Based Learning (LSOMMBL) is a neurally inspired technique which incorporates a modified self-organising map (SOM, also known as a 'Kohonen Map') into memory-based learning to select a subset of the training data for comparison with novel items. The SOM is trained with labelled inputs. During training, each unit in the map acquires a label. When an input is presented, the node in the map with the highest activation (the 'winner') is identified. If the winner is unlabelled, it acquires the label of its input. Labelled units only respond to similarly labelled inputs. Otherwise, training proceeds as with the normal SOM. When training ends, all inputs are presented to the SOM and the winning units for the inputs are noted. Any unused units are then discarded. Thus each remaining unit in the SOM is associated with the set of training inputs that are closest to it. This is used in MBL as follows. The labelled SOM is trained with inputs labelled with the output categories. When a novel item is presented, the winning unit for each category is found, the training items associated with the winning units are searched for the item closest to the novel item, and the most frequent classification of that item is used as the classification for the novel item.
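The classification step for novel items can be sketched as follows, assuming the labelled map has already been trained and pruned (training is not shown). The sketch uses Euclidean distance between an input and a unit's weight vector as a stand-in for activation (closest unit = most active), which is an assumption; the data layout with `weights`, `label` and `members` keys and the toy map are hypothetical.

```python
import math
from collections import Counter


def classify(item, units):
    """Classify a novel item with a trained, pruned labelled SOM.

    Each unit is a dict with 'weights' (a vector), 'label' (an output
    category) and 'members' (the (vector, category) training items mapped
    to that unit when all training inputs were presented after training).
    """
    # 1. One winner per category: the closest unit carrying that label
    #    (distance stands in for activation: closest = most active).
    winners = {}
    for unit in units:
        d = math.dist(item, unit["weights"])
        if unit["label"] not in winners or d < winners[unit["label"]][0]:
            winners[unit["label"]] = (d, unit)
    # 2. Pool the training items associated with the winning units.
    candidates = [m for _, unit in winners.values() for m in unit["members"]]
    # 3. Find the closest stored item; if it occurs several times with
    #    different categories, return its most frequent category.
    best = min(math.dist(item, vec) for vec, _ in candidates)
    votes = Counter(cat for vec, cat in candidates if math.dist(item, vec) == best)
    return votes.most_common(1)[0][0]


# Hypothetical trained map with two output categories, I and O.
units = [
    {"weights": (0.0, 0.0), "label": "O",
     "members": [((0.1, 0.0), "O"), ((0.0, 0.2), "O")]},
    {"weights": (1.0, 1.0), "label": "I",
     "members": [((0.9, 1.0), "I"), ((1.0, 0.8), "I")]},
    {"weights": (5.0, 5.0), "label": "I",
     "members": [((5.0, 5.1), "I")]},
]
print(classify((0.8, 0.9), units), classify((0.2, 0.1), units))
```

Only the members of one winning unit per category are searched, so the distant unit at (5.0, 5.0) is never compared against the novel item; this is the data reduction the map provides.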

Maximum Entropy. When building a classifier, one must gather evidence for predicting the correct class of an item from its context. The Maximum Entropy (MaxEnt) framework is especially suited for integrating evidence from various information sources. Frequencies of evidence/class combinations (called features) are extracted from a sample corpus and considered to be properties of the classification process. Attention is constrained to models with these properties. The MaxEnt principle then demands that, among all the probability distributions that obey these constraints, the most uniform one is chosen. During training, features are assigned weights in such a way that, given the MaxEnt principle, the training data is matched as well as possible. During evaluation, it is tested which features are active (a feature is active when the context meets the requirements given by the feature). For every class, the weights of the active features are combined and the best-scoring class is chosen (Berger et al., 1996). For the classifier built here we use as evidence the surrounding words, their POS tags and the baseNP tags predicted for the previous words. A mixture of simple features (consisting of one of the mentioned information sources) and complex features (combinations thereof) was used. The left context never exceeded 3 words; the right context was at most 2 words. The model was calculated using existing software (Dehaspe, 1997).
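The scoring step can be sketched as a log-linear model, p(c | x) = exp(Σᵢ λᵢ fᵢ(x, c)) / Z(x), where each feature pairs a context predicate with a class and Z(x) normalises over the classes. This is the standard conditional MaxEnt form (Berger et al., 1996). The context window mirrors the limits stated above (at most three words left, two right, plus previously predicted baseNP tags), but the predicate names and the weights are invented for illustration, and weight estimation itself is not shown.

```python
import math


def context_predicates(words, pos_tags, prev_chunks, i):
    """Context predicates for position i: words and POS tags up to three
    positions to the left and two to the right, the chunk tags already
    predicted for the (at most three) previous words, and one complex
    predicate combining the current POS tag with the previous chunk tag."""
    preds = []
    for offset in range(-3, 3):
        j = i + offset
        if 0 <= j < len(words):
            preds.append(f"w[{offset}]={words[j]}")
            preds.append(f"t[{offset}]={pos_tags[j]}")
    for offset in range(-3, 0):
        j = i + offset
        if j >= 0:
            preds.append(f"c[{offset}]={prev_chunks[j]}")
    if i > 0:
        preds.append(f"t[0]={pos_tags[i]}&c[-1]={prev_chunks[i - 1]}")
    return preds


def maxent_distribution(preds, weights, classes):
    """p(c | context) = exp(sum of weights of active features) / Z.

    A feature (predicate, class) is active for class c when its predicate
    holds in the context; features without a weight contribute nothing."""
    scores = {c: sum(weights.get((p, c), 0.0) for p in preds) for c in classes}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}


# Invented (untrained) weights, for illustration only.
weights = {
    ("t[0]=NN", "I"): 1.2,
    ("c[-1]=I", "I"): 0.8,
    ("t[0]=NN", "B"): 0.3,
    ("t[0]=VBZ", "O"): 2.0,
}
words = ["He", "reckons", "the", "current", "account", "deficit"]
pos_tags = ["PRP", "VBZ", "DT", "JJ", "NN", "NN"]
prev_chunks = ["I", "O", "I", "I"]          # tags already assigned to words 0-3
preds = context_predicates(words, pos_tags, prev_chunks, 4)
dist = maxent_distribution(preds, weights, ["I", "O", "B"])
print(max(dist, key=dist.get), dist)
```

For "account", the active features for class I sum to 2.0, so I is chosen with probability e²/(e² + 1 + e^0.3), about 0.76.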

Inductive Logic Programming (ILP). Aleph is an ILP machine learning system that searches for a hypothesis, given positive (and, if available, negative) data in the form of ground Prolog terms and background knowledge (prior knowledge made available to the learning algorithm) in the form of Prolog predicates. The system then constructs a set of hypothesis clauses that fit the data and background as well as possible. In order to approach the problem of NP chunking in this context of single-predicate learning, it was reformulated as a tagging task where each word was tagged as being 'inside' or 'outside' a baseNP (consecutive NPs were treated appropriately). The target theory is then a Prolog program that correctly predicts a word's tag given its context. The context consisted of PoS-tagged words and syntactically tagged words to the left and PoS-tagged words to the right, so that the resulting tagger can be applied in a left-to-right pass over PoS-tagged text.
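The left-to-right application of such a tagging theory can be sketched as follows. The clauses here are hand-written Python stand-ins for the Prolog clauses Aleph would induce, not actual Aleph output; they only show how each decision may use PoS tags and already assigned tags on the left but only PoS tags on the right. The tag names I/O, the default tag and the function name are assumptions.

```python
def tag_left_to_right(pos_tags, clauses, default="O"):
    """Tag words as inside (I) or outside (O) a baseNP in one left-to-right pass.

    Each clause is (condition, tag); the first clause whose condition holds
    decides. A condition sees (PoS, tag) pairs on the left, the current PoS
    tag, and only PoS tags on the right, since tags to the right are not yet known.
    """
    assigned = []
    for i, current in enumerate(pos_tags):
        left = list(zip(pos_tags[:i], assigned))
        right = pos_tags[i + 1:]
        for condition, tag in clauses:
            if condition(left, current, right):
                assigned.append(tag)
                break
        else:                      # no clause fired
            assigned.append(default)
    return assigned


# Hand-written stand-ins for induced clauses (hypothetical, not Aleph output).
clauses = [
    (lambda left, cur, right: cur in {"DT", "PRP", "PRP$"}, "I"),
    (lambda left, cur, right: cur in {"JJ", "NN", "NNS"}
        and bool(left) and left[-1][1] == "I", "I"),
    (lambda left, cur, right: cur == "JJ" and right[:1] in (["NN"], ["NNS"]), "I"),
    (lambda left, cur, right: cur in {"NN", "NNS"}, "I"),
]

pos = ["PRP", "VBZ", "DT", "JJ", "NN", "NN", "MD", "VB"]
print(tag_left_to_right(pos, clauses))
```

The second clause consults a tag that the tagger itself assigned earlier in the same pass, which is why the theory can only look at syntactic tags on the left.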

Minimum Description Length (MDL). Estimation using the minimum description length principle involves finding a model which not only 'explains' the training material well, but is also compact. The basic idea is to balance the generality of a model (roughly speaking, the more compact the model, the more general it is) with its specialisation to the training material. We have applied MDL to the task of learning broad-coverage definite-clause grammars from either raw text or parsed corpora (Osborne, 1999a). Preliminary results have shown that learning using just raw text is worse than learning with parsed corpora, and that learning using both parsed corpora and a compression-based prior is better than learning using parsed corpora and a uniform prior. Furthermore, we have noted that our instantiation of MDL does not capture dependencies which exist either in the grammar or in preferred parses. Ongoing work has focused on applying random field technology (maximum entropy) to MDL-based grammar learning (see Osborne (2000a) for some of the issues involved).

Finite State Transducers are built by interpreting probabilistic automata as transducers. We use a probabilistic grammatical inference algorithm, the DDSM algorithm (Thollard, 2001), for learning automata that provide the probability of an item given the previous ones. The items are described by bigrams of the form (feature, class). In the resulting automata, we consider a transition labelled with such a bigram as a transducer transition that takes the first part (the feature) as input and outputs the second part (the class). By applying the Viterbi algorithm to such a model, we can find the most probable sequence of class values given an input sequence of feature values. As the DDSM algorithm has a tuning parameter, it can provide many different automata. We apply a majority vote over the propositions made by the automata/transducers computed in this way to obtain the results mentioned in this paper.

Memory-based learning methods store all training data and classify test data items by giving them the classification of the most similar training data items. We have used three different algorithms: the nearest neighbour algorithm IB1-IG, which is part of the Timbl software package (Daelemans et al., 1999), the decision tree learner IGTree, also from Timbl, and C5.0, a commercial version of the decision tree learner C4.5 (Quinlan, 1993). They are classifiers, which means that they assign phrase classes such as I (inside a phrase), B (at the beginning of a phrase) and O (outside a phrase) to words. In order to improve the classification process we provide the systems with extra information about the words, such as the previous n words, the next n words, their part-of-speech tags and chunk tags estimated by an earlier classification process. We use the default settings of the software except for the number of examined nearest neighbourhood regions for IB1-IG (k, default 1), which we set to 3.
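IB1-IG classifies with an overlap distance in which each mismatching feature is weighted by its information gain on the training data (Daelemans et al., 1999). The following sketch implements that idea from scratch; it is not Timbl's code. For simplicity it treats k as the number of nearest neighbours, whereas Timbl's k counts nearest distances (the 'neighbourhood regions' above), and the part-of-speech window features and training instances are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(instances, labels, f):
    """H(C) minus the expected class entropy after splitting on feature f."""
    by_value = defaultdict(list)
    for x, y in zip(instances, labels):
        by_value[x[f]].append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder


def ib1_ig(train_x, train_y, query, k=3):
    """Majority class of the k training instances closest to query under
    the overlap metric, where each mismatching feature costs its IG weight."""
    weights = [information_gain(train_x, train_y, f) for f in range(len(query))]

    def distance(x):
        return sum(w for w, a, b in zip(weights, x, query) if a != b)

    nearest = sorted(range(len(train_x)), key=lambda i: distance(train_x[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical instances: (previous POS, POS, next POS) -> chunk tag.
train_x = [("DT", "NN", "VBZ"), ("JJ", "NN", "VBD"), ("VBZ", "DT", "JJ"),
           ("NN", "VBZ", "DT"), ("NN", "VBD", "IN"), ("IN", "DT", "NN")]
train_y = ["I", "I", "I", "O", "O", "I"]
print(ib1_ig(train_x, train_y, ("DT", "NN", "VBD")))   # k defaults to 3, the value used above
```

Computing the weights once per query keeps the sketch short; a practical implementation would compute them once after reading the training data.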

2.3 Combination techniques 

When different systems are applied to the same problem, a clever combination of their results can outperform each of the individual systems (Dietterich, 1997). The reason for this is that the systems often make different errors, and some of these errors can be eliminated by examining the classifications of the others. The simplest combination method is MAJORITY VOTING. It examines the classifications that the systems assign to each test data item and chooses the most frequently predicted classification. Despite its simplicity, majority voting has been found to be quite useful for boosting performance on the tasks that we are interested in.
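Token-level majority voting can be sketched as follows. The tie-breaking rule (prefer the tag of the earliest-listed system among the tied ones) is an assumption, as is voting on per-word tags; note that voting per word can produce tag sequences that are not well-formed chunk encodings.

```python
from collections import Counter


def majority_vote(system_outputs):
    """Combine per-word tag sequences from several systems: for each word,
    choose the tag predicted most often. Ties go to the earliest-listed
    system among those proposing a tied tag."""
    combined = []
    for predictions in zip(*system_outputs):
        counts = Counter(predictions)
        top = max(counts.values())
        combined.append(next(p for p in predictions if counts[p] == top))
    return combined


# Hypothetical outputs of three systems for the same four words.
system_a = ["B-NP", "I-NP", "O", "B-VP"]
system_b = ["B-NP", "B-NP", "O", "B-VP"]
system_c = ["B-NP", "I-NP", "B-PP", "O"]
print(majority_vote([system_a, system_b, system_c]))
```

Each system makes one error, but on a different word, so the vote recovers the tags on which the other two agree.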

We have applied majority voting and nine other combination methods to the output of the learning systems that were applied to the three tasks. Nine combination methods were originally suggested by Van Halteren et al. (1998). Five of them, including majority voting, are so-called voting methods. Apart from majority voting, all assign weights to the predictions of the different systems based on their performance on training data not used for training, the tuning data. TotPrecision uses classifier weights based on each classifier's overall accuracy. TagPrecision applies classification weights based on the accuracy of the classifier for that classification. Precision-Recall uses classification weights that combine the precision of the classification with the recall of the competitors. And finally, TagPair uses classification pair weights based on the probability of a classification for some predicted classification pair (van Halteren et al., 1998).
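The two simplest weighted methods, TotPrecision and TagPrecision, can be sketched with per-word tags as follows. The sketch measures weights on hypothetical tuning data and is a simplified reading of the descriptions above, not the implementation used by Van Halteren et al. (1998); Precision-Recall and TagPair are not shown, and all names are illustrative.

```python
from collections import defaultdict


def tot_precision_weights(tuning_outputs, gold):
    """TotPrecision: one weight per system, its accuracy on the tuning data."""
    return [sum(p == g for p, g in zip(out, gold)) / len(gold) for out in tuning_outputs]


def tag_precision_weights(tuning_outputs, gold):
    """TagPrecision: one weight per (system, tag), the precision of that
    system on the tuning data whenever it predicts that tag."""
    weights = []
    for out in tuning_outputs:
        predicted, correct = defaultdict(int), defaultdict(int)
        for p, g in zip(out, gold):
            predicted[p] += 1
            correct[p] += p == g
        weights.append({t: correct[t] / predicted[t] for t in predicted})
    return weights


def weighted_vote(test_outputs, weight_of):
    """Per word, sum weight_of(system, tag) over the systems proposing each
    tag and choose the tag with the highest total."""
    combined = []
    for predictions in zip(*test_outputs):
        scores = defaultdict(float)
        for system, tag in enumerate(predictions):
            scores[tag] += weight_of(system, tag)
        combined.append(max(scores, key=scores.get))
    return combined


# Hypothetical tuning data for three systems, and their output on two test words.
gold_tuning = ["I", "I", "I", "O", "O", "O"]
tuning = [["I", "I", "I", "O", "O", "O"],   # system 0: always right
          ["I", "I", "O", "I", "O", "O"],   # system 1: 4 of 6 right
          ["I", "O", "O", "I", "I", "I"]]   # system 2: 1 of 6 right
test = [["O", "I"], ["I", "O"], ["I", "O"]]

tot = tot_precision_weights(tuning, gold_tuning)
tag_w = tag_precision_weights(tuning, gold_tuning)
print(weighted_vote(test, lambda s, t: tot[s]))
print(weighted_vote(test, lambda s, t: tag_w[s].get(t, 0.0)))
```

In this toy setting the reliable system 0 outvotes the other two on both test words under either weighting, whereas plain majority voting would follow systems 1 and 2.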

The remaining four combination methods are so-called stacked classifiers. The idea is to make a classifier process the output of the individual systems. We used the two memory-based learners IB1-IG and IGTree as stacked classifiers. Like Van Halteren et al. (1998), we evaluated two feature combinations; two learners with two feature combinations each yield the four stacked methods. The first combination consisted of the predictions of the individual systems and the second of the predictions plus one feature that described the data item. We used the feature that, according to the memory-based learning metrics, was most relevant to the tasks: the part-of-speech tag of the data item.
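Building the instances for a stacked classifier can be sketched as follows: each instance holds the tags proposed by the individual systems, optionally extended with the part-of-speech tag. To keep the example short, a plain overlap-metric nearest-neighbour rule stands in for IB1-IG and IGTree; the data and function names are hypothetical.

```python
def stacked_instances(system_outputs, pos_tags=None):
    """One instance per word: the tags proposed by the individual systems,
    optionally followed by the word's part-of-speech tag."""
    rows = [list(preds) for preds in zip(*system_outputs)]
    if pos_tags is not None:
        for row, pos in zip(rows, pos_tags):
            row.append(pos)
    return [tuple(row) for row in rows]


def nearest_neighbour(train_x, train_y, query):
    """Plain overlap-metric 1-NN, a simplified stand-in for IB1-IG/IGTree."""
    def mismatches(x):
        return sum(a != b for a, b in zip(x, query))
    best = min(range(len(train_x)), key=lambda i: mismatches(train_x[i]))
    return train_y[best]


# Hypothetical tuning data: outputs of three systems and the correct tags.
tuning_outputs = [["I", "O", "O", "I"],    # system 0
                  ["I", "I", "O", "O"],    # system 1
                  ["O", "I", "O", "O"]]    # system 2
gold = ["I", "O", "O", "I"]
train_x = stacked_instances(tuning_outputs)
print(nearest_neighbour(train_x, gold, ("O", "I", "I")))
```

Because the tuning data shows that the first system is always right, the stacked learner follows it on the query (O, I, I), where majority voting would have chosen I.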

In the course of this project we have evaluated another combination method: BEST-N MAJORITY VOTING (Tjong Kim Sang et al., 2000). This is similar to majority voting except that, instead of using the predictions of all systems, it uses only the predictions of some of the systems for determining the most probable classifications. We have found that, for various reasons, some systems perform worse than others, and that including their results in the majority vote decreases the combined performance. Therefore it is a good idea to evaluate majority voting on subsets of all systems rather than only on the combination of all systems.
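Selecting the subset for best-n majority voting can be sketched as an exhaustive search over all subsets of n systems, scoring each subset's majority vote on development data. As a simplification the sketch scores per-word tagging accuracy rather than the Fβ=1 rate on constituents; the data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_vote(outputs):
    """Per-word majority vote; ties go to the earliest-listed system."""
    combined = []
    for predictions in zip(*outputs):
        counts = Counter(predictions)
        top = max(counts.values())
        combined.append(next(p for p in predictions if counts[p] == top))
    return combined


def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def best_n_subset(dev_outputs, dev_gold, n):
    """Indices of the n systems whose majority vote scores best on the
    development data (exhaustive search over all subsets of size n)."""
    return max(combinations(range(len(dev_outputs)), n),
               key=lambda idx: accuracy(majority_vote([dev_outputs[i] for i in idx]), dev_gold))


# Hypothetical development data: three good systems and one poor one.
dev_gold = ["I", "I", "O", "O", "I", "O"]
dev_outputs = [["I", "I", "O", "O", "I", "I"],
               ["I", "O", "O", "O", "I", "O"],
               ["I", "I", "O", "I", "I", "O"],
               ["O", "O", "I", "I", "O", "I"]]
chosen = best_n_subset(dev_outputs, dev_gold, 3)
print(chosen, accuracy(majority_vote([dev_outputs[i] for i in chosen]), dev_gold))
```

Leaving out the poor fourth system lets the best-3 vote tag every development word correctly, while a vote over all four systems does not.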

Apart from standard majority voting, all combination methods require extra data, the tuning data, on which the performance of the individual systems is measured in order to determine their weights. This data can be extracted from the training data, or the training data can be processed in an n-fold cross-validation process, after which the performance on the complete training data can be measured. Although some work with individual systems in the project has been done with the goal of combining the results with other systems, tuning data is not always available for all results. Therefore it will not always be possible to apply all ten combination methods to the results. In some cases we have to restrict ourselves to evaluating majority voting only.
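For chunking, Section 3.1 applies majority voting not to whole chunks but, following Tjong Kim Sang (2000b), separately to chunk start and end positions, after which chunks are restored. The following sketch shows one way to do this. The rule used to pair voted starts with voted ends (the nearest end of the same type before the next voted start) is an assumption made for this sketch and not necessarily the procedure of Tjong Kim Sang (2000b); chunks are hypothetical (start, end, type) triples with inclusive word indices.

```python
from collections import Counter


def voted_positions(position_sets, n_systems):
    """Positions proposed by more than half of the systems."""
    counts = Counter(p for positions in position_sets for p in positions)
    return {p for p, c in counts.items() if c > n_systems / 2}


def combine_chunks(system_chunks):
    """Majority voting on chunk starts and chunk ends separately.

    system_chunks holds one list of (start, end, type) chunks per system,
    with inclusive word indices. After voting, each winning start is paired
    with the nearest winning end of the same type that lies before the next
    winning start; starts without such an end are dropped.
    """
    n = len(system_chunks)
    starts = voted_positions([{(s, t) for s, _, t in chunks} for chunks in system_chunks], n)
    ends = voted_positions([{(e, t) for _, e, t in chunks} for chunks in system_chunks], n)
    start_points = sorted(s for s, _ in starts)
    combined = []
    for s, t in sorted(starts):
        following = [p for p in start_points if p > s]
        limit = following[0] if following else float("inf")
        candidates = [e for e, e_type in ends if e_type == t and s <= e < limit]
        if candidates:
            combined.append((s, min(candidates), t))
    return combined


# Hypothetical output of three systems for
# "He reckons the current account deficit will narrow" (words 0-7).
system_a = [(0, 0, "NP"), (1, 1, "VP"), (2, 5, "NP"), (6, 7, "VP")]
system_b = [(0, 0, "NP"), (1, 1, "VP"), (2, 3, "NP"), (4, 5, "NP"), (6, 7, "VP")]
system_c = [(0, 0, "NP"), (1, 1, "VP"), (2, 5, "NP"), (6, 6, "VP"), (7, 7, "VP")]
print(combine_chunks([system_a, system_b, system_c]))
```

Systems b and c each split one chunk in two, but the extra boundaries are outvoted, so the restored chunks match system a.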

3 Results 

This section presents the results of the different systems applied to the three tasks which were central to this project: chunking, NP chunking and NP bracketing.



3.1 Chunking 

Chunking was the shared task of CoNLL-2000, the workshop on Computational Natural Language Learning, held in Lisbon, Portugal in 2000 (Tjong Kim Sang and Buchholz, 2000). Six members of the project have performed this task. The results of the six systems (precision, recall and Fβ=1) can be found in Table 1. Belz (2001) used Local Structural Context Grammars for finding chunks. Dejean (2000a) applied the theory refinement system ALLiS to the shared task data. Koeling (2000) evaluated a maximum entropy learner while using different feature combinations (ME). Osborne (2000b) used a maximum entropy-based part-of-speech tagger for assigning chunk tags to words (ME Tag). Thollard (2001) identified chunks with Finite State Transducers generated by a probabilistic grammatical inference algorithm (FST). Tjong Kim Sang (2000b) tested different configurations of combined memory-based learners (MBL). The FST and the LSCG results are lower than those of the other systems because they were obtained without using lexical information. The best result at the workshop was obtained with Support Vector Machines (Kudoh and Matsumoto, 2000).

System       Precision   Recall    Fβ=1
MBL             94.04%   91.00%   92.50
ALLiS           91.87%   92.31%   92.09
ME              92.08%   91.86%   91.97
ME Tag          91.65%   92.23%   91.94
LSCG            87.97%   88.17%   88.07
FST             84.92%   86.75%   85.82
combination     93.68%   92.98%   93.33
best            93.45%   93.51%   93.48
baseline        72.58%   82.14%   77.07

Table 1: The chunking results for the six systems associated with the project (shared task CoNLL-2000). The baseline results have been obtained by selecting the most frequent chunk tag associated with each part-of-speech tag. The best results at CoNLL-2000 were obtained by Support Vector Machines. A majority vote of the six LCG systems does not perform much worse than this best result. A majority vote of the five best systems outperforms the best result slightly (5% error reduction).

Because there was no tuning data available for the systems, the only combination technique we could apply to the six project results was majority voting. We applied majority voting to the output of the six systems using the same approach as Tjong Kim Sang (2000b): combining the start and end positions of chunks separately and restoring the chunks from these results. The combined performance (Fβ=1 = 93.33) was close to the best result published at CoNLL-2000 (93.48).

3.2 NP chunking

The NP chunking task is the specialisation of the chunking task in which only base noun phrases need to be detected. Standard data sets for machine learning approaches to this task were put forward by Ramshaw and Marcus (1995). Six project members have applied a total of seven different systems to this task, most of them in the context of the combination paper Tjong Kim Sang et al. (2000). Daelemans applied the decision tree learner C5.0 to the task. Dejean used the theory refinement system ALLiS for finding noun phrases in the data. Hammerton (2001) predicted NP chunks with the connectionist methods based on self-organising maps (SOM). Koeling detected noun phrases with a maximum entropy-based learner (ME). Konstantopoulos (2000) used Inductive Logic Programming (ILP) techniques for finding NP chunks in unseen texts³. Tjong Kim Sang applied combinations of IB1-IG systems (MBL) and combinations of IGTree learners to this task. The results of six of the seven systems can be found in Table 2. The results of C5.0 and SOM are lower than the others because neither of these systems used lexical information.

System       Precision   Recall    Fβ=1
MBL             93.63%   92.88%   93.25
ME              93.20%   93.00%   93.10
ALLiS           92.49%   92.69%   92.59
IGTree          92.28%   91.65%   91.96
C5.0            89.59%   90.66%   90.12
SOM             89.29%   89.73%   89.51
combination     93.78%   93.52%   93.65
best            94.18%   93.55%   93.86
baseline        78.20%   81.87%   79.99

Table 2: The NP chunking results for six systems associated with the project. The baseline results have been obtained by selecting the most frequent chunk tag associated with each part-of-speech tag. The best results for this task have been obtained with a combination of seven learners, five of which were operated by project members. The combination of these five performances is not far off these best results.

For all of the systems except SOM we had tuning data and an extra development data set available. We tested all ten combination methods on the development set, and best-3 majority voting came out as the best (Fβ=1 = 93.30; it used the MBL, ME and ALLiS results). When we applied best-3 majority voting to the standard test set, we obtained Fβ=1 = 93.65, which is close to the best result we know for this data set (Fβ=1 = 93.86) (Tjong Kim Sang et al., 2000). The latter result was obtained by a combination of seven learning systems, five of which were operated by members of this project.



3 Results are unavailable for the ILP approach. 





System       Precision   Recall    Fβ=1
MBL             90.00%   78.38%   83.79
LSCG            80.04%   80.25%   80.15
MDL              53.2%    68.7%    59.9
best            91.28%   76.06%   82.98
baseline        77.57%   59.85%   67.56

Table 3: The results for three systems associated with the project for the NP bracketing task, the shared task at CoNLL-99. The baseline results have been obtained by finding NP chunks in the text with an algorithm which selects the most frequent chunk tag associated with each part-of-speech tag. The best results at CoNLL-99 were obtained with a bottom-up memory-based learner. An improved version of that system (MBL) delivered the best project result. The MDL results have been obtained on a different data set, and therefore combination of the three systems was not feasible.



The original Ramshaw and Marcus (1995) publication evaluated their NP chunker on two data sets, the second holding a larger amount of training data (Penn Treebank sections 02-21) while using section 00 as test data. Tjong Kim Sang (2000a) has applied a combination of memory-based learners to this data set and obtained Fβ=1 = 94.90, an improvement on Ramshaw and Marcus's 93.3.

3.3 NP bracketing 

Finding arbitrary noun phrases was the shared task of CoNLL-99, held in Bergen, Norway in 1999. Three project members have performed this task. Belz (2001) extracted noun phrases with Local Structural Context Grammars, a variant of Data-Oriented Parsing (LSCG). Osborne (1999b) used a Definite Clause Grammar learner based on Minimum Description Length for finding noun phrases in samples of Penn Treebank material (MDL). Tjong Kim Sang (2000) detected noun phrases with a bottom-up cascade of combinations of memory-based classifiers (MBL). The performance of the three systems can be found in Table 3. For this task it was not possible to apply system combination to the output of the systems. The MDL results have been obtained on a different data set, and this left us with two remaining systems. A majority vote of two systems cannot improve on the better one, since every disagreement between them is a tie, and since there was no tuning data or development data available, other combination methods could not be applied.

4 Prospects 

The project has proven to be successful in applying machine learning techniques to all three of its selected tasks: chunking, NP chunking and NP bracketing. We are looking forward to applying these techniques to other NLP tasks. Three of our project members will take part in the CoNLL-2001 shared task, 'clausing', hopefully with good results. Two more have started working on the challenging task of full parsing, in particular by starting with a chunker and building a bottom-up arbitrary phrase recogniser on top of that. The preliminary results are encouraging, though not as good as advanced statistical parsers such as those of Charniak (2000) and Collins (2000).

It is fair to characterise LCG's goals as primarily technical, in the sense that we sought to maximise performance rates, especially the recognition of different levels of NP structure. Our view in the project is certainly broader, and most project members would include learning as one of the language processes one ought to study from a computational perspective, like parsing or generation. This suggests several further avenues; e.g., one might compare the learning progress of simulations to that of humans (mastery as a function of experience). One might also be interested in the exact role of supervision, in the behaviour (and availability) of incremental learning algorithms, and also in comparing the simulation's error functions to those of human learners (with respect to phrase length, construction frequency or similarity). This would add an interesting cognitive perspective to the work, along the lines begun by Brent (1997), but we note it here only as a prospect for future work.